Skip to content

Fix: /chat/completions logs status 500 even when the wire status is 404/503#2

Draft
mimeding wants to merge 2 commits into
mainfrom
cursor/fix-chat-completions-error-log-status-2812
Draft

Fix: /chat/completions logs status 500 even when the wire status is 404/503#2
mimeding wants to merge 2 commits into
mainfrom
cursor/fix-chat-completions-error-log-status-2812

Conversation

@mimeding

Copy link
Copy Markdown
Owner

Summary

Why this matters (business)

The Insights tab and any future log-driven dashboards are how operators tell "is my server healthy?" apart from "did a client send me a bad request?" Today, every non-streaming /chat/completions error is logged as 500 Internal Server Error, even when the user just asked for a model that isn't installed (404) or a remote provider is down (503). That mis-classification makes the metrics look like the server is on fire when it's actually a clean client- or config-level failure — exactly the noise PR osaurus-ai#863 set out to remove.

Same effect on the WorkView error classifier: it consumes Insights entries to give users actionable feedback ("install the model" vs "check your network"), and that classifier was already wired up correctly for the wire response — it just never saw the right status from the log.

What's wrong (technical)

The non-streaming handler maps ChatEngine.EngineError to its intended httpStatus and writes that to the wire:

                    let status: HTTPResponseStatus
                    let body: String
                    if let engineError = error as? ChatEngine.EngineError {
                        status = HTTPResponseStatus(statusCode: engineError.httpStatus)
                        body = engineError.errorDescription ?? error.localizedDescription
                    } else {
                        status = .internalServerError
                        body = "Internal error: \(error.localizedDescription)"
                    }
                    ...
                        var responseHead = HTTPResponseHead(version: head.version, status: status)

…but the Insights log call right below hardcoded responseStatus: 500:

                    logSelf.logRequest(
                        ...
                        responseStatus: 500,
                        ...
                    )

Fix

Use Int(status.code) so the log entry matches the wire status. One-line change, zero behaviour impact on actual responses; only Insights observability improves.

Out of scope (intentional): the streaming /chat/completions, /messages, and /responses error paths log 500. Those handlers commit the SSE 200 head before the failure happens or always reply 500 on the wire, so their log status already matches the wire reality. They can be revisited as a follow-up if/when those handlers gain EngineError-aware status mapping.

Changes

  • Behavior change (observability only — wire response is unchanged)
  • UI change
  • Refactor / chore
  • Tests (this is a pure log-string change; an end-to-end Insights test would require wiring InsightsService into HTTP handler tests, which is larger than the fix)
  • Docs

Test Plan

  1. Start the server, point a client at a model id that isn't installed: curl localhost:1234/v1/chat/completions -d '{"model":"nonexistent","messages":[{"role":"user","content":"hi"}]}'.
  2. Wire response should be 404 (already correct).
  3. Open Insights → request should now show 404 instead of 500.
  4. Repeat with a remote provider in a disconnected state → should show 503.

Checklist

  • I have read CONTRIBUTING.md
  • I added/updated tests where reasonable (see "out of scope" above)
  • I updated docs/README as needed (n/a)
  • I verified build on macOS with Xcode 16.4+ (authored in a Linux CI sandbox)
Open in Web Open in Cursor 

Non-streaming /chat/completions correctly maps ChatEngine.EngineError to
its intended HTTP status (404 for unknown model, 503 for unavailable
provider, etc., per PR osaurus-ai#863) when writing the response, but the
Insights log entry always recorded responseStatus: 500.

That makes per-status metrics, the WorkView error classifier, and any
log-driven debugging report blanket 5xx server faults even when the
real failure was a client-side 4xx (e.g. wrong model id). Use the same
status.code we already wrote to the wire so the log entry matches the
HTTP response.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
ModelManager.init kicks off an unstructured Task that calls
loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization
listing from Hugging Face and feeds the result through
applyOsaurusOrgFetch.

The unit-test runner repeatedly constructs ModelManager() to drive
applyOsaurusOrgFetch directly. The background launch-time fetch
races with those test calls — whichever finishes last wins, and
the merge result is non-deterministic. That's the root cause of
the flaky ModelManagerSuggestedTests failures seen across many of
the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetched
OnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.).

Gate the launch-time fetch on a small isRunningInTestEnvironment
helper that checks for any of XCTestConfigurationFilePath,
XCTestBundlePath, or XCTestSessionIdentifier in the process
environment. Those variables are only present inside an xctest host
process; production app launches still get the HF fetch exactly as
before.

This is a network call, so removing it under tests also has the
side benefit of making the test suite work offline / on hermetic
CI runners.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants